Revert "fix(trtllm_mha): clamp page_table to k_cache page range"#7
Open
pyc96 wants to merge 1 commit into
Open
Revert "fix(trtllm_mha): clamp page_table to k_cache page range"#7pyc96 wants to merge 1 commit into
pyc96 wants to merge 1 commit into
Conversation
This reverts commit 5547e41. PR #5 (the clamp) is no longer needed because PR #6 (Patch E) eliminates the source of OOB page_table values entirely. The clamp's only side-effect was a known quality limitation -- when the clamp actually triggered, it replaced an OOB page index with the LAST valid SWA page, producing slightly wrong attention values for that position and lowering MTP draft acceptance. With Patch E in place those OOB values never occur and the clamp never fires, so it's dead code that adds one .clamp() per kernel call for no benefit. Verified after this revert (Gemma-4-E4B-IT + trtllm_mha + MTP + summarization 8 k/1 k x 80 on 1x B200): outcome: OK (zero trap events from PR #3 debug) accept length: matches the pre-revert PR #6 run TPOT: matches the pre-revert PR #6 run If a future code change reintroduces an OOB page_table value, the opt-in bounds-check trap from PR #3 (SGLANG_TRTLLM_MHA_DEBUG=1) will still catch it with a deterministic Python exception + dump for triage. Co-authored-by: Claude
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Revert the defensive clamp from #5. It is no longer
needed now that #6 (Patch E) eliminates the source of OOB
page_table values entirely.
Stacked on #6.
Why revert
The clamp was a safety net that traded correctness for liveness:
when it actually triggered, it replaced an OOB page index with the
LAST valid SWA page, producing slightly wrong attention values for
that position and lowering MTP draft acceptance (1.67 instead of
2.84 on 26B summarization).
Patch E (#6) fixes the actual root cause — the
FROZEN_KV_MTP draft backend not swapping its SWA-aware attributes
when
token_to_kv_poolis swapped to the target's SWA pool — soOOB values never occur and the clamp never fires. It's now dead
code that adds one
.clamp()per kernel call for no benefit.Verification (Gemma-4-E4B-IT, trtllm_mha, MTP, summarization 8 k/1 k × 80, 1× B200)
Same results. The clamp truly was dead code.
Safety net
If a future code change ever reintroduces an OOB page_table value,
the opt-in bounds-check trap from #3
(
SGLANG_TRTLLM_MHA_DEBUG=1) will still catch it with adeterministic Python exception + dump for triage.
CI States
Latest PR Test (Base): ❌ Missing
run-cilabel -- add it to run CI tests.Latest PR Test (Extra): ❌ Blocked --
run-ciis required first.